SARS-CoV-2-positive patients display considerable differences in proteome diversity in urine, nasopharyngeal, gargle solution and bronchoalveolar lavage fluid samples

Proteome profile changes post-severe acute respiratory syndrome coronavirus 2 (post-SARS-CoV-2) infection in different body sites of humans remains an active scientific investigation whose solutions stand a chance of providing more information on what constitutes SARS-CoV-2 pathogenesis. While proteomics has been used to understand SARS-CoV-2 pathogenesis, there are limited data about the status of proteome profile in different human body sites infected by the SARS-CoV-2 virus. To bridge this gap, our study aims to characterize the proteins secreted in urine, bronchoalveolar lavage fluid (BALF), gargle solution, and nasopharyngeal samples and assess the proteome differences in these body samples collected from SARS-CoV-2-positive patients. We downloaded publicly available proteomic data from (https://www.ebi.ac.uk/pride/). The data we downloaded had the following identifiers: (i) PXD019423, n = 3 from Charles Tanford Protein Center in Germany. (ii) IPX0002166000, n = 15 from Beijing Proteome Research Centre, China. (iii) IPX0002429000, n = 5 from Huazhong University of Science and Technology, China, and (iv) PXD022889, n = 18 from Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN 55905 USA. MaxQuant was used for the human peptide spectral matching using human and SARS-CoV-2 proteome database which we downloaded from the UniProt database (access date 13th October 2021). The individuals infected with SARS-CoV-2 viruses displayed a different proteome diversity from the different body sites we investigated. Overally, we identified 1809 proteins across the four sample types we compared. Urine and BALF samples had significantly more abundant SARS-CoV-2 proteins than the other body sites we compared. Urine samples had 257(33.7%) unique proteins, followed by nasopharyngeal with 250(32.8%) unique proteins. Gargle solution and BALF had 38(5%) and 73(9.6%) unique proteins respectively. Urine, gargle solution, nasopharyngeal, and bronchoalveolar lavage fluid samples have different protein diversity in individuals infected with SARS-CoV-2. Moreover, our data also demonstrated that a given body site is characterized by a unique set of proteins in SARS-CoV-2 seropositive individuals.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in Wuhan, China, in December 2019 and spread rapidly worldwide [1]. The SARS-CoV-2 is an enveloped RNA virus with a positive-sense, single-stranded RNA genome of approximately 30 kb (Coronaviridae Study Group of the International Committee on Taxonomy of Viruses, 2020). Coronavirus disease 2019 (COVID-19) [2] is the clinical syndrome associated with SARS-CoV-2 and is characterized by respiratory or gastrointestinal viral symptoms. COVID-19 may result in clinical features such as cardiovascular, neurological, thrombosis, and renal failure [3]. Approximately 269 million COVID-19 cases with over 5.3 million deaths were reported globally on 13-12-2021 (https://CoVid19.who.int/). Globally, the exact impact of SARS-CoV-2 infection is unknown; even though the confirmed number is stillon the upward trajectory. The number of cases seems to vary geographically. As of 13 th December 2021, the approximate number of cases per region was as follows America: 98 million, Europe: 91 million, Southeast Asia: 44 million, Eastern Mediterranean: 16 million, Western pacific: 10 million, and Africa: 6.5 million (https://CoVid19.who.int/). COVID-19 spreads from person to person through direct contact or encountering infected surfaces. When SARS-CoV-2 is inhaled, it enters the human host cells via angiotensin-converting enzyme 2 (ACE2) receptors [4]. Once the virus enters the human cells, it starts replicating, leading to population expansion within the cells [4]. While in the cells, it induces the local immune cells to start producing cytokines and chemokines, resulting in the attraction of other immune cells in the lung, which causes excessive tissue damage [5]. A growing body of evidence indicates that the SARS-CoV-2 virus is not confined to the human lungs. Still, it also affects the other body organs, such as the kidney, where it causes acute kidney injury (AKI) [6]. In other individuals infected with SARS-CoV-2, neurological, cardiovascular, and intestinal malfunctions have also been reported [3].
Proteomics has played a fundamental role in the surveillance of SARS-CoV-2 spread globally, drug target identification, vaccine designs, and the development of rapid diagnostic kits used in health facilities [7]. Proteomics has enabled the development of methods to detect SARS-CoV-2 infections to complement the genomics assays [8]. Proteomic profiling aims to identify the most regulated proteins following SARS-CoV-2 infection to identify the potential biomarkers. Still, it can also be used to understand the host-SARS-CoV-2 interactions, protein-protein interactions, post-translational modifications, proteome expression patterns, and the cellular localization of the proteins [7,9]. To date, functional and differential proteomics has enabled the generation of an enormous amount of information, leading to the identification and characterization of SARS-CoV-2 infection and the regulated pathways [10,11]. Different proteomic biospecimen has increased our understanding and the dynamics of the SARS-CoV-2 virus in our population. Ihling and associates used the proteome obtained from the gargle solution to develop the mass spectrometry identification method for SARS-CoV-2 identifications [8]. In predicting SARS-CoV-2 clinical outcome, urine has been used, and the motivation could be due to ease of collection, which makes it attractive for proteomic analysis [10].
On the other hand, Li and associates used urine to profile individuals with the SARS-CoV-2 infection [12]. Samples collected from the nasopharyngeal site, the point of SARS-CoV-2 entry, have been integral in characterizing the host response following SARS-CoV-2 infection [9]. Bronchoalveolar lavage fluid (BALF) has provided answers to poorly understood questions about the SARS-CoV-2 pathogenesis at the site of infection, the human lungs [11]. Using BALF, Zheng identified pathways involved in oxidative stress and the immunological responses as the main enriched pathways from the BALF proteome [11]. These studies have used biospecimen collected from different body sites, each focusing on a specific biospecimen.
To the best of our knowledge, no documented proteomics study has attempted to understand the proteome profile of the human nasopharyngeal, bronchoalveolar space, urine, and the gargle solutions from individuals with SARS-CoV-2 infection. Therefore, we hypothesized that the proteome profile of the infected body sites is different because these tissues are made up of different cell types. In this study, we sort to gain more insight into the proteomic profiles of human urine, BALF, gargle solutions, and nasopharyngeal proteome and assess how the proteome profile compares when the SARS-CoV-2 virus colonizes different body sites.

Materials and methods
This study analyzed publicly available data downloaded from the Protein Identification Database, PRIDE (https://www.ebi.ac.uk/pride/) repository. The data used in this analysis had the following identifiers: (i) PXD019423 (gargle solution, n = 3) [8], ii) IPX0002166000 (Urine, n = 15) [12], iii) IPX0002429000 (BALF, n = 5) [11], and (iv) PXD022889 (nasopharynx, n = 18) [9]. We acknowledge that the samples used in this study were processed in different laboratories worldwide, and the sample preparation protocols have been reported elsewhere in the respective publications [8][9][10][11]. Briefly, all the samples used in the analysis were obtained from the adult patients with confirmed SARS-CoV-2 infection using reverse-transcriptase and quantitative polymerase chain reaction (RT-qPCR). All samples from the four experiments the samples were centrifuged to remove the debris. Trypsin enzyme was then used to digest the proteome samples at 37˚C and 1% formic acid was used to terminate the digestion in all four sample preparation protocols. The tryptic peptides were then loaded onto the LC-MS/MS for measurements.

Bioinformatics data analysis
Raw data files were processed with MaxQuant version 1.6.10.43 [13] for protein and peptide identification using the Andromeda search engine and the combined UniProt [14] proteome for Homo sapiens (Proteome ID: UP000005640, 78120 entries, and SARS-CoV-2 Proteome ID: UP000464024, entries 17 both accessed on 13/10/2021). MaxQuant default parameter settings were used for the MS/MS database search, with carbamidomethylation of cysteine residues and acetylation of protein N-termini selected as fixed modification and oxidation of methionine as variable modification. The peptide spectral matches (PSMs) were filtered at a 1% false discovery rate (FDR), and the precursor mass tolerance was set at 20 ppm. Trypsin/P was selected as protease, and label-free quantitation (LFQ) was enabled. The samples from the different studies were processed together to ensure cross-normalization, making the proteome comparison more accurate. Reverse hits and common contaminants were removed from the data set prior to the downstream analysis.
We did further data processing, using the Bioconductor package 'Differential Enrichment analysis of Proteomics data version 1.2.0 [15] to do the proteomic differential analysis by comparing the body sites under investigation. The protein groups identified in 70% of patients of the total patient population and were supported by at least two unique peptides were retained for analysis. We used "MinProb" for imputation with a q-value cut-off of 0.01. For unsupervised clustering, principal component analysis was performed on the data after imputation. Proteins differentially expressed after the challenge were identified using the limma function, including Benjamin-Hochberg multiple testing corrections. Proteins were considered differentially expressed if they survived a log2(x) fold change of 2 (as indicated) and an adjusted p-value of 0.05. Volcano Plots were visualized using the 'enhancedVolcanoplot' package in the Bioconductor package.

Results
A recent study using multi-omics approaches such as proteomics, transcriptomes phosphoproteome, and ubiquitinome demonstrated that SARS-CoV-2 infections cause perturbations of the host upon infection at different omics levels [16]. Following SARS-CoV-2 infections in human hosts, it has been demonstrated that it affects different body sites such as epithelium layers [17], kidneys [6], enterocytes [18], and lung injuries [19]. Thus, we wondered if the proteome profile from different body sites has the same proteome profile or differences in protein composition post-SARS-CoV-2 infection in humans. In our study, we investigated the proteome profile of the body sites mentioned above from individuals with confirmed SARS-CoV-2 infection, which were deposited in the PRIDE repository [20]. We hypothesized that the host proteome profile in the different body sites is the same.

Proteome samples clustered according to body site
The study reveals that the proteomes secreted from the different body sites have different proteon profiles and composition [Fig 1]. Overall,1809 proteins were detected across the four different sample types we compared after removing potential contaminants, reverse proteins, and the proteins identified only by sites and the one-hit wonders. The findings show that the urine samples clustered closely together following principal component analysis (PCA) [21], suggesting that the urine proteome is less diverse because the heterogeneity in composition was not observed. The lack of diversity was also evident in the BALF and indicated that there is a coordinated protein secretion in the lungs of the individuals infected with the SARS-CoV-2 virus. The gargle showed heterogeneity in proteome composition and indication that the gargle solution has a relatively diverse proteome profile in the SARS-CoV-2 seropositive samples. Interestingly, the nasopharyngeal samples demonstrated a high diversity and heterogeneity in proteome composition compared with the data obtained from the other body sites [Fig 1]. Our analysis of proteins obtained from the individuals with confirmed SARS-CoV-2 infections shows that the different body sites respond differently to SARS-CoV-2 antigens.

Body sites are characterized by different proteins abundance post-SARS-CoV-2 infection
We investigated the protein abundance across the four different samples. Our data demonstrate a clear clustering of individuals into two main clusters [Fig 2]. The difference in the cluster shows that the proteome profiles of the different body sites are the same; their abundances differ [Fig 2]. Using K-means clustering of proteins, we identified six main protein clusters. Cluster 1 proteins were less abundant in the urine and nasopharynx samples.
In contrast, the cluster 1 proteins were more abundant in the bronchoalveolar lavage fluid and gargle solution samples [Fig 2]. Proteins in clusters 2,3, 5, and 6 were more abundant in the urine samples. Interestingly, these proteins (in clusters 2,3,4, and 6) were less abundant in the bronchoalveolar lavage fluid, gargle solution, and nasopharynx samples [Fig 2]. Proteins in cluster 4 were more abundant in the urine, bronchoalveolar fluid, and gargle solution, and these proteins were less abundant in the nasopharynx samples [Fig 2].

A unique set of proteins is dominating human body sites during SARS-CoV-2 infection
The union analysis using Venny 2.1 was conducted to determine the unique and overlapping proteins from the four different body sites we compared [Fig 3]. Urine samples had 257 (33.7%) proteins unique to that body site. We identified 250 (32.8%) proteins uniquely identified in the nasopharynx protein samples. The gargle solution was characterized by a low The log 2 centered intensity shows the level of protein abundance with "red" representing more abundant proteins and "blue" represents less abundant proteins. The patient groups are shown in the key to the right.

PLOS ONE
SARS-CoV-2-positive patients display considerable differences in proteome diversity number of unique identified proteins; 38 (5%) of the identified proteins were unique. Following our analysis, the BALF had 73 (9.6%) of the identified proteins. Our data shows that the less diverse samples, urine, BALF, and gargle solutions, have different unique proteins in SARS-CoV-2 infected individuals.

Different body sites have a different set of regulated proteins post-SARS-CoV-2 infection in the human host
We compared the different body sites to identify the regulated proteins in the various body sites post-SARS-CoV-2 infections in humans [see Fig 4A-4F]. There were 101 upregulated proteins in the gargle solution and 97 downregulated proteins in the BALF when the BALF proteome profile was compared with the gargle solution [ Fig 4A]. Comparing BALF and nasopharynx proteome profiles, 441 proteins were upregulated in the nasopharynx, and 138 were downregulated in the BALF [ Fig 4B]. There were 331 upregulated proteins in the urine samples compared with the 118 proteins downregulated in the BALF when the BALF proteome

PLOS ONE
SARS-CoV-2-positive patients display considerable differences in proteome diversity was compared with the urine samples [ Fig 4C]. Comparing gargle solution and nasopharynx data, we identified 52 significantly upregulated proteins in the nasopharynx and 104 ones downregulated in the gargle solution [ Fig 4D]. We then compared gargle solution and the urine samples, the two distant body sites, to identify the regulated proteins. Interestingly, 144 and 95 proteins were upregulated in urine and gargle solutions [ Fig 4E]. Finally, we compared the nasopharynx and the urine samples, and 166 proteins were upregulated in the urine samples compared with one downregulated protein in the nasopharynx samples [ Fig 4F].

Discussion
Proteomics analysis of SARS-CoV-2 data has been used to identify the potential therapeutic targets in human hosts [22], a practical approach to combating the control and spread of SARS-CoV-2 in our population. The development of testing kits has been made possible due to the use of proteomic data to study and understand SARS-CoV-2 proteomic biomarkers [23,24]. It has also been effective in identifying variants with multiple mutations at the immunodominant spike protein that facilitates viral cell entry through the angiotensin converting enzyme 2 (ACE2) receptor [25]. This study describes the proteomic profile of the nasopharynx, gargle solution, urine, and bronchoalveolar lavage fluid obtained from individuals with confirmed SARS-CoV-2 infections [8,9,11,26]. The SARS-CoV-2 infection was confirmed by reverse transcriptase and quantitative polymerase chain reaction (RT-PCR). Principal component analysis (PCA) [27] reveals a differential diversity of proteins in the investigated body sites. The urine, BALF, and gargle solution proteome profiles demonstrated low diversity, while the nasopharynx proteome data showed a high diversity since they did not cluster together in space. The difference in the proteome diversity can be attributed to the fact that SARS-CoV-2 affects the different body sites differently, as was demonstrated by Feng et al. 2020 [28]. The proteomic data obtained from the nasopharynx demonstrated a high diversity, and this could be explained in part due to heterogeneity of "angiotensin-converting enzyme 2 (ACE2) expression and tissue susceptibility to SARS-CoV-2 infection" [29]. Another body of evidence shows that the high diversity of proteomes in the nasopharynx samples could be attributed to the impact of the virus on microbiome [30].
The protein abundance was different in different body sites that we compared. Most proteins in clusters 1,3,4,5, and 6 [ Fig 2] were more abundant in the urine samples than in the nasopharynx, gargle solution, and the BALF. The more abundant proteins in the urine samples were also detected in the BALF and gargle solution in clusters 1 and 4 [ Fig 2]. Chavan et al. 2021 [31] demonstrated that the urine proteome was more differentially expressed in the SARS-CoV-2 cases than in the negative control. The difference in abundance and diversity of the proteome profile can be due to the difference in the SARS-CoV-2 protein source as demonstrated in our data. In the urine samples, the nucleocapsid protein is the predominant source of protein hence the lack of proteome diversity in this body site [31].
The union analysis reveals a unique set of proteins that characterize human hosts different body sites post-SARS-CoV-2 infections. On the other hand, there were overlaps of the identified proteins from the different body sites with various percentages. Urine samples had the largest number of unique proteins (n = 257; 33.7%), followed by the nasopharynx (n = 250; 33.8%). The gargle solution had the lowest number of the identified proteins even though the SARS-CoV-2 peptides could still be identified in the gargle solution, making it an important alternative source of samples for SARS-CoV-2 testing. Urine and gargle solution samples can identify and characterize the SARS-CoV-2 virus since they are less invasive and easy to obtain, unlike nasopharyngeal samples. Form the venn analysis, we demonstrated that a unique set of proteins are accumulated in different body sites following the SARS-CoV-2 infection in humans.
Urine, gargle solution, BALF, and nasopharyngeal samples demonstrated the regulated proteins' differences. These differences need to be elucidated, and their clinical relevance needs further investigation. We hypothesize that this significantly regulated protein difference could hamper drug development. The clinical trials should factor the multi-organ comparisons of the proteome profiles of the individuals infected with the SARS-CoV-2 virus in the design and development of anti-SARS-CoV-2 drugs.
Our analysis is more likely to inform the clinical trials and management of SARS-CoV-2 infections in our societies. Using union analysis (Venn diagram), we demonstrated for the first time the potential global impact of SARS-CoV-2 infection in different body sites of the human host. We identified the common and unique proteins which characterize, and we opine that the finding will inform the clinical trials focusing on protein biomarkers. Since SARS-CoV-2 is a novel virus, the test kits need further improvements, and the conclusions of this study can inform the improvement of the test kits currently being used in hour health facilities [31]. The protein data can help us understand the damage caused in the distant body organs due to SARS-CoV-2 infection in the human body. Another plausible explanation could be that the protein data is helping in understanding the pathophysiology of SARS-CoV-2 infection and the differential response to infection mounted in different body sites. The findings in this analysis add another layer of information on the impact of SARS-CoV-2 infection in the different body sites of the human host. We suggest further studies using protein microarrays to further help in understanding the protein expression patterns in the body sites post-SARS-CoV-2 infections in humans, as it will aid the development of medical interventions which can be helpful in the management and treatment of SARS-CoV-2 [22]. The difference in the protein profiles following SARS-CoV-2 infection stands a chance of informing the clinical trials using proteins as potential biomarkers because different protein types characterize the different body sites. Selection of a given protein biomarker should depend on the body site from which the biospecimen is collected in test kit developments.
In conclusion, we demonstrated that the different body sites have different protein diversity in individuals with confirmed RT-qPCR SARS-CoV-2 infection. This study is a proof-of-concept study demonstrating that for the effective design and development of SARS-CoV-2 antiviral drugs, the protein profiles of the different body sites must be considered. The finding in this study could have a direct implication on performing population-wide effects of SARS-CoV-2 infections in different body sites.
This study had some limitations. We acknowledge the difference in the sample preparation protocols, which could be a potential confounder. This being a secondary data analysis; we did not have sufficient study participants' information, and we acknowledge this because it can contribute to the inaccurate interpretation of the results.